Abstract

Many real-world networks display a natural bipartite structure. It is necessary
and important to study bipartite networks by using the bipartite structure of
the data. Here we propose a modification of the clustering coefficient given by
the fraction of cycles of size four in bipartite networks. We then compare the
two definitions on a special graph, and the results show that the modified
definition characterizes the network better. Next we define an edge-clustering
coefficient for bipartite networks and use it to detect community structure in
the original bipartite networks.

1 Introduction

In recent years, as more and more systems in many different fields have come to be
described as complex networks, research on complex networks has gradually become an
important topic in the study of complexity [1] [2] [3]. A network is composed of a set
of vertices and a set of edges, each of which represents a relationship between two
nodes. Examples include the WWW, the Internet, food webs, biochemical networks, social
networks, and so on [4] [5] [6] [7] [8] [9]. Research on networks not only raises new
concepts and methods, but also helps us understand complex systems.

Many real-world networks display a natural bipartite structure, such as the
actors-films network [10], the papers-scientists network [11] [12] [13], and so on.
In bipartite networks, there are two kinds of nodes, called top nodes and bottom
nodes, and edges only connect pairs of vertices that belong to different sets. To
investigate the properties of such networks, one often first projects them onto
one-mode networks, which are also called classical networks. However, the one-mode
projection of a bipartite graph generally loses some information of the original
bipartite network, inflates the number of edges, and brings other drawbacks caused
by projection [14]. We believe that this affects the measured properties of the
networks, especially their community structure. We therefore study the community
structure and other properties of the original bipartite networks directly, and
develop methods for detecting community structure in them.

Because of the drawbacks of projection, many authors try to analyze such networks
by using the bipartite structure of the data. Several notions and properties have
been introduced and investigated directly in the original bipartite networks, such
as clustering [?] [?], overlap [25], betweenness [?], and others [?] [?] [26] [15] [27].

The outline of this article is as follows. In Section 2, we turn to the clustering
coefficient, which is one of the most important properties of classical graphs and
has also attracted much attention in bipartite network research. We propose a
definition of the clustering coefficient based on the study in [15], and then use it
to measure the clustering coefficient of two real-world networks. In Section 3, we
use an algorithm based on the clustering coefficient of links to detect the
community structure of computer-generated bipartite networks. In Section 4, we give
some concluding remarks.

2 The clustering coefficient of bipartite networks 

The clustering coefficient C3 is one of the most important properties of classical
networks. It is defined as the ratio of the number of observed triangles to the
number of all possible triangles in the network. It can be used to characterize
small-world networks [10], understand synchronization in scale-free networks [16],
and analyze networks of social relationships [17] [18]. As in one-mode networks, the
clustering coefficient of bipartite networks deserves close attention [15].

A bipartite network consists of two different kinds of nodes, and links can exist
only between two nodes from distinct sets. Many real-world networks display a
natural bipartite structure, such as the actors-films network, the
papers-scientists network and so on. The clustering coefficient of classical graphs
measures the density of triangles. However, by the definition of a bipartite
network, no triangle can be formed in it; the basic cycle in a bipartite network is
a square. The clustering coefficient C4 should therefore quantify the density of
squares, in the same way as C3 quantifies the density of triangles. In social
language, it measures the probability that my friends have common friends other
than me. Some other definitions of the clustering coefficient in bipartite networks
have been proposed [24] [27]. In this paper, we present a new definition based on
the one given in [15].
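
To illustrate why squares take the place of triangles, the following Python sketch counts both kinds of cycle through a node. The adjacency dictionary `adj`, the helper names and the toy graph are illustrative assumptions and do not come from the paper.

```python
from itertools import combinations

def triangles_through(adj, i):
    """Count triangles containing node i: pairs of neighbors that are linked."""
    return sum(1 for m, n in combinations(sorted(adj[i]), 2) if n in adj[m])

def squares_through(adj, i):
    """Count squares containing node i: for each pair of neighbors (m, n),
    every common neighbor of m and n other than i closes one square."""
    return sum(len((adj[m] & adj[n]) - {i})
               for m, n in combinations(sorted(adj[i]), 2))

# Hypothetical toy bipartite graph: top nodes A, B and bottom nodes a, b.
adj = {
    "A": {"a", "b"},
    "B": {"a", "b"},
    "a": {"A", "B"},
    "b": {"A", "B"},
}

print(triangles_through(adj, "A"))  # no triangle can exist in a bipartite graph
print(squares_through(adj, "A"))    # the single square A-a-B-b
```

In this toy graph the neighbors of a top node are all bottom nodes, which are never linked to each other, so the triangle count is necessarily zero while the square count is not.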

In [15], the clustering coefficient is defined as the ratio of the number of observed
squares to the total number of possible squares in the graph. For a given node i, the
number of observed squares is given by the number of common neighbors among its
neighbors, while the total number of possible squares is given by the sum over each
pair of neighbors of the product of their degrees, after subtracting the common
node i and an additional one if they are connected. The equation is:



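\[ C_{4,mn}(i) = \frac{q_{imn}}{(k_m - \eta_{imn})(k_n - \eta_{imn}) + q_{imn}} \qquad (1) \]

where m and n are a pair of neighbors of node i, $q_{imn}$ is the number of squares
that include these three nodes, and $\eta_{imn} = 1 + q_{imn} + \theta_{mn}$, with
$\theta_{mn} = 1$ if neighbors m and n are connected with each other and 0 otherwise.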

We think that this way of counting the total number of possible squares has a
drawback. The denominator of equation (1) should be changed into
$(k_m - \eta_{imn}) + (k_n - \eta_{imn}) + q_{imn}$. The equation is corrected as

\[ C_{4,mn}(i) = \frac{q_{imn}}{(k_m - \eta_{imn}) + (k_n - \eta_{imn}) + q_{imn}} \qquad (2) \]

where each symbol has the same meaning as above. Here we give an example to show
why we make this change. In Figure 1 (a), consider nodes m and n, which are nearest
neighbors of node i. There is $q_{imn} = 1$ square, and $k_m = 4$, $k_n = 3$,
$\theta_{mn} = 0$, so $\eta_{imn} = 2$. The denominator of equation (1) then equals
$(4-2)(3-2)+1 = 3$. This does not agree with what we see in the figure, where there
should be 4 possible squares, namely iman, imbn, imdn and imcn. The denominator of
equation (2), $(4-2)+(3-2)+1 = 4$, gives the better answer. We also use these two
definitions to calculate the clustering coefficient of a special graph (shown in
Figure 1 (b)). The results are shown in Table 1. Taking into account the connections
among vertices and the results given by the two equations, we consider that the
denominator of equation (2) is the better choice for computing the clustering
coefficient C4, because the distinct connections of each node should lead to
different properties for each node, in particular a different clustering coefficient.
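
The comparison of the two denominators can be checked numerically with a minimal Python sketch. The function `c4`, the helper `from_edges` and the toy graph are hypothetical; the graph is built only to reproduce the values $k_m = 4$, $k_n = 3$, $q_{imn} = 1$ and $\theta_{mn} = 0$ quoted above, and is not the graph of Figure 1. Aggregating over all neighbor pairs by summing numerators and denominators is also an assumption of this sketch.

```python
from itertools import combinations

def c4(adj, i, variant="sum"):
    """Square clustering coefficient of node i.

    variant="product": per-pair denominator (k_m - eta)(k_n - eta) + q,
                       the form described for equation (1).
    variant="sum":     per-pair denominator (k_m - eta) + (k_n - eta) + q,
                       the modified form of equation (2).
    """
    num = 0
    den = 0
    for m, n in combinations(sorted(adj[i]), 2):
        q = len((adj[m] & adj[n]) - {i})   # squares containing i, m and n
        theta = 1 if n in adj[m] else 0    # always 0 in a bipartite graph
        eta = 1 + q + theta
        a, b = len(adj[m]) - eta, len(adj[n]) - eta
        num += q
        den += (a * b if variant == "product" else a + b) + q
    return num / den if den else 0.0

def from_edges(edges):
    """Build an undirected adjacency dictionary from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

# Hypothetical graph with k_m = 4, k_n = 3 and one square through i, m, n.
adj = from_edges([("i", "m"), ("i", "n"), ("m", "a"), ("n", "a"),
                  ("m", "b"), ("m", "c"), ("n", "d")])

print(c4(adj, "i", "product"))  # denominator (4-2)(3-2)+1 = 3
print(c4(adj, "i", "sum"))      # denominator (4-2)+(3-2)+1 = 4
```

Because node i has only one pair of neighbors here, the two calls differ only through the denominator term discussed in the text.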

Table 1: Clustering coefficient of each node in Figure 1 obtained by equations (1)
and (2).

0.3333

In order to complete the comparison between the two definitions of the clustering
coefficient given in equations (1) and (2), we apply them to real bipartite network
data sets. The first is the Econophysicists bipartite network, which we built
ourselves and which is composed of 818 authors and 777 papers. The second is a
books-readers database obtained from the Beijing Normal University library during
one semester, with 17593 readers and 91750 books. The results obtained with
equations (1) and (2) are displayed in Tables 2 and 3.

Figure 1: (a) An example showing that equation (2) is better. (b) A graph consisting
of six nodes.

Table 2: Clustering coefficient of authors and papers in the Econophysicists
bipartite network obtained by equations (1) and (2).

            Authors     Papers
    eq 1    0.18923     0.14353
    eq 2    0.305       0.16916



Table 3: Clustering coefficient of books and readers in the books-readers network
obtained by equations (1) and (2).

            Books       Readers
    eq 1    0.00063     0.00321
    eq 2    0.00449     0.00632



3 The community structure of bipartite graphs 

Community structure arises from differences in the strength of connections among
vertices. A community structure is a division of the network vertices into groups
such that links are dense within groups, while nodes in different groups are only
loosely connected [20]. It is one of the most important features for understanding
the functional properties of complex structures. Recent empirical studies show that
communities exist in most social and biological networks [20] [21]. This finding is
very significant for understanding network structure. Taking the collaboration
network of jazz musicians as an example, the analysis reveals the presence of
communities that correlate strongly with the recording locations of the bands, and
also shows racial segregation among the musicians [5]. In food webs, communities
reveal the subsystems of the ecosystem [6]. An email network can be divided into
departmental groups whose work is distinct, so that the communities reveal the
organizational structure; the results also reveal the self-organization of the
network into a state where the distribution of community sizes is self-similar
[7] [8]. Deeper research on community structure will help us better comprehend and
analyze the characteristics of such systems.

Figure 2: A schematic representation of a bipartite network with community structure.
There are three communities with dense internal links (solid lines), and sparse
connections (dotted lines) between them.

All of the above work was done on one-mode networks. However, many real-world
networks display a natural bipartite structure, such as the actors-films network,
the papers-scientists network and so on. To investigate the community structure of
such a network, one often first projects it onto a one-mode network. We believe that
the projection brings drawbacks and affects the properties of the network,
especially its community structure. We should therefore pay more attention to
analyzing the original bipartite graph.

As in classical networks, the community structure of a bipartite network is a
division of the nodes into groups. Within groups there are dense links between the
two different sets of nodes, while nodes in one group are only loosely connected to
nodes of the other set in other groups (shown in Fig. 2). For one-mode networks,
Radicchi et al. have proposed a divisive algorithm [22]. They considered the
edge-clustering coefficient, defined in analogy with the usual node-clustering
coefficient. Here we can likewise define the edge-clustering coefficient of
bipartite networks as the number of squares to which a given edge belongs, divided
by the number of squares that might potentially include it. For the edge connecting
top node i to bottom node j, the edge-clustering coefficient is
Sm=l Sn=l ^ijmn + (X^m=l ^"i 1) + (X^n=l ^" Sm=l X]n=l ^iimn

(3)

where m is a neighbor of node i, n is one of j's neighbors, and $q_{ijmn} = 1$ if
neighbors m and n are connected with each other and 0 otherwise. $\theta_{ijmn}$ is
the opposite of $q_{ijmn}$, and $k_m$ is the degree of node m. The algorithm works
like the GN algorithm, except that the edge with the smallest value of $C_{ij}$ is
cut at each step.
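
As a partial illustration, the Python sketch below counts the squares to which an edge belongs, that is, the number of pairs (m, n) with $q_{ijmn} = 1$. It covers only this numerator; the normalization by the number of potential squares in equation (3) is deliberately not implemented. The function name and the toy graph are assumptions made for illustration.

```python
def squares_on_edge(adj, i, j):
    """Number of squares containing edge (i, j): pairs (m, n) where m is a
    neighbor of i other than j, n is a neighbor of j other than i, and
    m is linked to n."""
    return sum(1 for m in adj[i] - {j} for n in adj[j] - {i} if n in adj[m])

# Hypothetical toy graph: one square A-a-B-b plus a pendant edge B-c.
adj = {
    "A": {"a", "b"},
    "B": {"a", "b", "c"},
    "a": {"A", "B"},
    "b": {"A", "B"},
    "c": {"B"},
}

edges = [(t, b) for t in ("A", "B") for b in sorted(adj[t])]
counts = {e: squares_on_edge(adj, *e) for e in edges}
weakest = min(counts, key=counts.get)  # edge lying on the fewest squares
print(counts)
print(weakest)
```

In this toy graph the pendant edge lies on no square, so a divisive procedure that removes the least clustered edge first would cut it before any edge of the square.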

As is commonly done to test the performance of a method on one-mode networks, we
apply the edge-clustering coefficient algorithm to computer-generated bipartite
networks. Each generated network is made up of 64 top nodes and 64 bottom nodes,
divided into four separate communities with 16 top nodes and 16 bottom nodes each.
Vertices are assigned to groups and are randomly connected to vertices of the same
group by an average of <k_intra> links and to vertices of different groups by an
average of <k_inter> links. The total average degree of the vertices is fixed,
namely <k_intra> + <k_inter> = 16. As <k_inter> increases, the communities become
more diffuse and more difficult to detect.
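
One possible way to generate such a benchmark is sketched below in Python. The function `make_benchmark`, its parameters and the use of independent link probabilities are assumptions of this sketch: degrees match <k_intra> and <k_inter> only on average, which may differ from the procedure actually used for the results reported here.

```python
import random

def make_benchmark(k_inter, n_groups=4, group_size=16, k_total=16, seed=0):
    """Random bipartite graph with planted communities.

    Each top-bottom pair in the same group is linked with probability
    k_intra / group_size, and each pair in different groups with probability
    k_inter / (group_size * (n_groups - 1)), so that the expected degree of
    every node is k_intra + k_inter = k_total.
    """
    rng = random.Random(seed)
    k_intra = k_total - k_inter
    p_in = k_intra / group_size
    p_out = k_inter / (group_size * (n_groups - 1))
    n = n_groups * group_size
    edges = []
    for t in range(n):              # index of the top node
        for b in range(n):          # index of the bottom node
            same = (t // group_size) == (b // group_size)
            if rng.random() < (p_in if same else p_out):
                edges.append((("top", t), ("bottom", b)))
    return edges

edges = make_benchmark(k_inter=4)
print(len(edges) / 64)  # average degree of the top nodes, close to 16
```

With `k_inter=0` every node is linked to all 16 nodes of the other kind in its own group and to nothing else, which is the limiting case of perfectly separated communities.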

In the following numerical investigations, we generate 20 realizations of
computer-generated bipartite networks under the same conditions. For each
realization, we compare the detected groups with the presumed community structure
using the similarity function S introduced in [23]. This gives the accuracy of our
algorithm (shown in Fig. 3 (a) and 3 (b)).

Consider a bipartite network that includes 6 top nodes and 5 bottom nodes (shown in
Fig. 4 (a)). According to the definition of community structure in bipartite
networks given at the beginning of this section, Fig. 4 (a) consists of three
communities, {{A},{B,C,a,b},{D,E,F,d,e}}. As before, the usual first step would be
to project the bipartite network onto a one-mode network. Here we project it onto
the top nodes with weights (shown in Fig. 4 (b)). The WEO algorithm [28] divides the
projection into two parts, {{A,B,C},{D,E,F}}, which differs from what is shown in
the graph. When we instead apply our algorithm to the bipartite network itself, the
result is the same as what we see in the graph.
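
A weighted projection onto the top nodes can be sketched in Python as follows. The weighting rule used here, the number of bottom nodes shared by two top nodes, is an assumption, since the text does not state how the weights are defined; the input graph is a hypothetical toy example and not the network of Fig. 4.

```python
from collections import Counter
from itertools import combinations

def project_onto_top(bottom_adj):
    """Project a bipartite graph onto its top nodes.

    bottom_adj maps each bottom node to the set of top nodes it is linked to.
    The weight of a projected edge (u, v) is the number of bottom nodes that
    u and v have in common.
    """
    weights = Counter()
    for tops in bottom_adj.values():
        for u, v in combinations(sorted(tops), 2):
            weights[(u, v)] += 1
    return weights

# Hypothetical toy graph: bottom node a links four top nodes, b links two.
bottom_adj = {"a": {"A", "B", "C", "D"}, "b": {"C", "D"}}
weights = project_onto_top(bottom_adj)

n_bipartite_edges = sum(len(tops) for tops in bottom_adj.values())
print(n_bipartite_edges, len(weights))  # edges before and after projection
print(weights[("C", "D")])              # C and D share both bottom nodes
```

A single bottom node of degree k becomes a clique of k(k-1)/2 projected edges, which is one source of the edge inflation attributed to projection earlier in the paper.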

4 Conclusions 

In this paper, we proposed a modification of the clustering coefficient given by the
fraction of cycles of size four in bipartite networks, based on the work of Lind et
al. [15]. We used the two definitions to calculate the clustering coefficient of a
special graph and found that the two results differ. We consider that the definition
we propose gives the better answer, given the distinct connections of each node of
the graph. We then discussed the community structure of bipartite graphs and defined
an algorithm based on the edge-clustering coefficient of bipartite networks. In this
way, we avoided the drawbacks and effects that projection brings to the analysis of
community structure, as in the example given at the end of Section 3. Finally, we
tested the accuracy of this algorithm on computer-generated bipartite networks. We
found that it works well when the community structure is well defined by topological
linkage. However, the algorithm only takes into account nodes that are connected
more than twice; this limitation needs to be addressed in future work.

Figure 3: Performance of our algorithm as applied to computer-generated bipartite
networks with n = 128 and four communities of 16 top nodes and 16 bottom nodes each.
The total average degree is fixed to 16. (a) Accuracy for top nodes in
computer-generated bipartite networks with presumed community structure. (b) Accuracy
for bottom nodes in computer-generated bipartite networks with presumed community
structure. The x-axis is the average number of connections between nodes in
different groups, <k_inter>.

Figure 4: (a) A bipartite network composed of 6 top nodes and 5 bottom nodes. (b) The
projection of (a) onto the top nodes.